IEICE global.ieice.org Site

Author Search Result

[Author] Michitaka KAMEYAMA(65hit)

41-60hit(65hit)

Code Assignment Algorithm for Highly Parallel Multiple-Valued Combinational Circuits Based on Partition Theory
Saneaki TAMAKI Michitaka KAMEYAMA Tatsuo HIGUCHI

PAPER-Logic Design

Vol: E76-D No:5 Page(s): 548-554
Design of locally computable combinational circuits is a very important subject to implement high-speed compact arithmetic and logic circuits in VLSI systems. This paper describes a multiple-valued code assignment algorithm for the locally computable combinational circuits, when a functional specification for a unary operation is given by the mapping relationship between input and output symbols. Partition theory usually used in the design of sequential circuits is effectively employed for the fast search for the code assignment problem. Based on the partition theory, mathematical foundation is derived for the locally computable circuit design. Moreover, for permutation operations, we propose an efficient code assignment algorithm based on closed chain sets to reduce the number of combinations in search procedure. Some examples are shown to demonstrate the usefulness of the algorithm.
A Bit-Serial Reconfigurable VLSI Based on a Multiple-Valued X-Net Data Transfer Scheme
Xu BAI Michitaka KAMEYAMA

PAPER-Computer System

Vol: E96-D No:7 Page(s): 1449-1456
A multiple-valued data transfer scheme using X-net is proposed to realize a compact bit-serial reconfigurable VLSI (BS-RVLSI). In the multiple-valued data transfer scheme using X-net, two binary data can be transferred from two adjacent cells to one common adjacent cell simultaneously at each “X” intersection. One cell composed of a logic block and a switch block is connected to four adjacent cross points by four one-bit switches so that the complexity of the switch block is reduced to 50% in comparison with the cell of a BS-RVLSI using an eight nearest-neighbor mesh network (8-NNM). In the logic block, threshold logic circuits are used to perform threshold operations, and then their binary dual-rail voltage outputs enter a binary logic module which can be programmed to realize an arbitrary two-variable binary function or a bit-serial adder. As a result, the configuration memory count and transistor count of the proposed multiple-valued cell are reduced to 34% and 58%, respectively, in comparison with those of an equivalent CMOS cell. Moreover, its power consumption for an arbitrary 2-variable binary function becomes 67% at 800 MHz under the condition of the same delay time.
Architecture of a Stereo Matching VLSI Processor Based on Hierarchically Parallel Memory Access
Masanori HARIYAMA Haruka SASAKI Michitaka KAMEYAMA

PAPER-Digital Circuits and Computer Arithmetic

Vol: E88-D No:7 Page(s): 1486-1491
This paper presents a VLSI processor for high-speed and reliable stereo matching based on adaptive window-size control of SAD (Sum of Absolute Differences) computation. To reduce its computational complexity, SADs are computed using multi-resolution images. Parallel memory access is essential for highly parallel image processing. For parallel memory access, this paper also presents an optimal memory allocation that minimizes the hardware amount under the condition of parallel memory access at specified resolutions.
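For readers unfamiliar with SAD, the window cost named in the abstract can be sketched as follows. This is only an illustration in Python, not the processor's architecture: the function names, the fixed window half-width, and the left/right image convention are assumptions, and the adaptive window-size control and multi-resolution hierarchy are omitted.

```python
def sad(left, right, row, col, disparity, half):
    """Sum of absolute differences between the window centered at (row, col)
    in the left image and the window shifted left by `disparity` in the right image."""
    total = 0
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            total += abs(left[row + dr][col + dc]
                         - right[row + dr][col + dc - disparity])
    return total


def best_disparity(left, right, row, col, max_disparity, half=1):
    """Return the disparity in 0..max_disparity with the smallest SAD.
    The caller must keep every window inside the image bounds."""
    costs = [sad(left, right, row, col, d, half)
             for d in range(max_disparity + 1)]
    return costs.index(min(costs))
```

A larger `half` makes the match more robust in flat regions at the cost of more additions per candidate, which is the trade-off that window-size control addresses.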
Memory Allocation for Multi-Resolution Image Processing
Yasuhiro KOBAYASHI Masanori HARIYAMA Michitaka KAMEYAMA

PAPER-VLSI Systems

Vol: E91-D No:10 Page(s): 2386-2397
Hierarchical approaches using multi-resolution images are well-known techniques to reduce the computational amount without degrading quality. One major issue in designing image processors is to design a memory system that supports parallel access with a simple interconnection network. The complexity of the interconnection network mainly depends on memory allocation; it maps pixels onto memory modules and determines the required number of memory modules. This paper presents a memory allocation method to minimize the number of memory modules for image processing using multi-resolution images. For efficient search, the proposed method exploits the regularity of window-type image processing. A practical example demonstrates that the number of memory modules is reduced to less than 14% that of conventional methods.
A Collision Detection Processor for Intelligent Vehicles
Masanori HARIYAMA Michitaka KAMEYAMA

PAPER

Vol: E76-C No:12 Page(s): 1804-1811
Since careless driving can cause terrible traffic accidents, autonomous collision avoidance is an important subject for vehicles. Real-time collision detection between a vehicle and obstacles will be a key target for next-generation car electronics systems. In collision detection, a large storage capacity is usually required to store the 3-D information on the obstacles located in a workspace. Moreover, high computational power is essential not only in coordinate transformation but also in the matching operation. In the proposed collision detection VLSI processor, the matching operation is drastically accelerated by using a Content-Addressable Memory (CAM) which evaluates the magnitude relationships between an input word and all the stored words in parallel. A new obstacle representation based on a union of rectangular solids is also used to reduce the obstacle memory capacity, so that the collision detection can be performed only by parallel magnitude comparison. A parallel architecture using several identical processor elements (PEs) is employed to perform the coordinate transformation at high speed based on the COordinate Rotation DIgital Computation (CORDIC) algorithms. The collision detection time becomes 5.2 ms using 20 PEs and five CAMs with a 42-kbit capacity.
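The reason a union of rectangular solids reduces collision detection to magnitude comparison can be sketched in a few lines of Python. This is a sequential illustration only; the CAM in the abstract performs the comparisons against all stored words in parallel. Representing the vehicle by sample points and each solid by a pair of corner coordinates are assumptions made for the example.

```python
def inside_box(point, box):
    """True if `point` lies in the axis-aligned solid `box` = (low_corner, high_corner).
    Only magnitude comparisons are needed, one pair per axis."""
    low, high = box
    return all(lo <= p <= hi for p, lo, hi in zip(point, low, high))


def collides(vehicle_points, obstacle_boxes):
    """True if any vehicle sample point falls inside any obstacle solid."""
    return any(inside_box(p, b)
               for p in vehicle_points
               for b in obstacle_boxes)
```

The vehicle points would first be transformed into the obstacle coordinate frame, which is the step the abstract assigns to the CORDIC-based processor elements.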
Multiple-Valued Logic-in-Memory VLSI Architecture Based on Floating-Gate-MOS Pass-Transistor Logic
Takahiro HANYU Michitaka KAMEYAMA

PAPER-Non-Binary Architectures

Vol: E82-C No:9 Page(s): 1662-1668
A new logic-in-memory VLSI architecture based on multiple-valued floating-gate-MOS pass-transistor logic is proposed to solve the communication bottleneck between memory and logic modules. Multiple-valued stored data are represented by the threshold voltage of a floating-gate MOS transistor, so that a single floating-gate MOS transistor is effectively employed to merge multiple-valued threshold-literal and pass-switch functions. As an application, a four-valued logic-in-memory VLSI for high-speed pattern recognition is also presented. The proposed VLSI detects a stored reference word with the minimum Manhattan distance between a 16-bit input word and 16-bit stored reference words. The effective chip area, the switching delay and the power dissipation of a new four-valued full adder, which is a key component of the proposed logic-in-memory VLSI, are reduced to about 33 percent, 67 percent and 24 percent, respectively, in comparison with those of the corresponding binary CMOS implementation under a 0.5-µm flash EEPROM technology.
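The pattern-recognition task described above, finding the stored reference word at minimum Manhattan distance from the input, can be sketched in Python as follows. Treating each word as a sequence of four-valued digits (0 to 3) is an assumption for illustration; the abstract does not specify how the 16-bit words are grouped, and the sketch says nothing about the floating-gate circuit itself.

```python
def manhattan(a, b):
    """Manhattan distance: sum of digit-wise absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest_reference(word, references):
    """Index of the stored reference word closest to `word`.
    Ties are resolved in favor of the lowest index."""
    return min(range(len(references)),
               key=lambda i: manhattan(word, references[i]))
```

Each distance needs one subtraction per digit followed by an accumulation, which is consistent with the abstract naming the full adder as the key component.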
A Multiple-Valued Reconfigurable VLSI Architecture Using Binary-Controlled Differential-Pair Circuits
Xu BAI Michitaka KAMEYAMA

PAPER-Integrated Electronics

Vol: E96-C No:8 Page(s): 1083-1093
This paper presents a fine-grain bit-serial reconfigurable VLSI architecture using multiple-valued switch blocks and binary logic modules. Multiple-valued signaling is utilized to implement a compact switch block. A binary-controlled current-steering technique is introduced, utilizing a programmable three-level differential-pair circuit to implement a high-performance low-power arbitrary two-variable binary function, and increase the noise margins in comparison with the quaternary-controlled differential-pair circuit. A current-source sharing technique between a series-gating differential-pair circuit and a current-mode D-latch is proposed to reduce the current source count and improve the speed. It is demonstrated that the power consumption and the delay of the proposed multiple-valued cell based on the binary-controlled current-steering technique and the current-source-sharing technique are reduced to 63% and 72%, respectively, in comparison with those of a previous multiple-valued cell.
An FPGA-Oriented Motion-Stereo Processor with a Simple Interconnection Network for Parallel Memory Access
Seunghwan LEE Masanori HARIYAMA Michitaka KAMEYAMA

PAPER-Image Processing, Image Pattern Recognition

Vol: E83-D No:12 Page(s): 2122-2130
In designing a field-programmable gate array (FPGA)-based processor for motion stereo, a parallel memory system and a simple interconnection network for parallel data transfer are essential for parallel image processing. This paper, firstly, presents an FPGA-oriented hierarchical memory system. To reduce the bandwidth requirement between an on-chip memory in an FPGA and external memories, we propose an efficient scheduling: Once pixels are transferred to the on-chip memory, operations associated with the data are consecutively performed. Secondly, a rectangular memory allocation is proposed which allocates pixels to be accessed in parallel onto different memory modules of the on-chip memory. Consequently, completely parallel access can be achieved. The memory allocation also minimizes the required capacity of the on-chip memory and thus is suitable for FPGA-based implementation. Finally, a functional unit allocation is proposed to minimize the complexity between memory modules and functional units. An experimental result shows that the performance of the processor becomes 96 times higher than that of a 400 MHz Pentium II.
Implementation of a Low-Power FPGA Based on Synchronous/Asynchronous Hybrid Architecture
Shota ISHIHARA Ryoto TSUCHIYA Yoshiya KOMATSU Masanori HARIYAMA Michitaka KAMEYAMA

PAPER-Electronic Circuits

Vol: E94-C No:10 Page(s): 1669-1679
This paper presents a low-power FPGA based on mixed synchronous/asynchronous design. The proposed FPGA consists of several sections which consist of logic blocks, and each section can be used as either a synchronous circuit or an asynchronous circuit according to its workload. An asynchronous circuit is power-efficient for a low-workload section since it does not require the clock tree which always consumes the power. On the other hand, a synchronous circuit is power-efficient for a high-workload section because of its simple hardware. The major consideration is designing an area-efficient synchronous/asynchronous hybrid logic block. This is because the hardware amount of the asynchronous circuit is about double that of the synchronous circuit, and the typical implementation wastes half of the hardware in synchronous mode. To solve this problem, we propose a hybrid logic block that can be used as either a single asynchronous logic block or two synchronous logic blocks. The proposed FPGA is fabricated using a 65-nm CMOS process. When the workload of a section is below 22%, asynchronous mode is more power-efficient than synchronous mode. Otherwise synchronous mode is more power-efficient.
A VLSI-Oriented Digital Signal Processor Based on Pulse-Train Residue Arithmetic Circuit with a Multiplier
Michitaka KAMEYAMA Oluwole ADEGBENRO Tatsuo HIGUCHI

PAPER-Signal Processing

Vol: E68-E No:1 Page(s): 14-21
This paper proposes a new residue number multiplication scheme based on the cyclic type of relationship which exists between the entries in the residue number multiplication truth-table when the modulus is any prime number. Using the scheme, multiplication is direct without table consultation and an entire truth-table is realizable. The multiplier circuit is simple and compact and allows pipelined processing of data. The flexibility of the multiplier is exploited in the implementation of an RNS-based high-order FIR digital filter by using a programmable low-order section. The suitability of the modular digital processor for VLSI is also indicated.
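The cyclic relationship mentioned above follows from a standard fact: for a prime modulus p, the nonzero residues are the successive powers of a primitive root, so multiplication becomes addition of exponents modulo p - 1. The Python sketch below illustrates that fact only. It is not the pulse-train circuit of the paper, and unlike the paper's scheme it uses lookup tables; all function names are hypothetical.

```python
def primitive_root(p):
    """Smallest primitive root of an odd prime p, found by brute force."""
    for g in range(2, p):
        if len({pow(g, i, p) for i in range(p - 1)}) == p - 1:
            return g
    raise ValueError("no primitive root found; p must be an odd prime")


def build_tables(p):
    """power[i] = g**i mod p, and index[v] = i such that power[i] == v."""
    g = primitive_root(p)
    power = [pow(g, i, p) for i in range(p - 1)]
    index = {value: i for i, value in enumerate(power)}
    return power, index


def residue_multiply(a, b, p, power, index):
    """a * b mod p computed by adding exponents around the cycle of length p - 1."""
    a, b = a % p, b % p
    if a == 0 or b == 0:
        return 0  # zero lies outside the multiplicative cycle
    return power[(index[a] + index[b]) % (p - 1)]
```

For p = 7 the primitive root found is 3, and the nonzero residues appear in the cyclic order 1, 3, 2, 6, 4, 5.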
Multiple-Valued Fine-Grain Reconfigurable VLSI Using a Global Tree Local X-Net Network
Xu BAI Michitaka KAMEYAMA

PAPER-VLSI Architecture

Vol: E97-D No:9 Page(s): 2278-2285
A global tree local X-net network (GTLX) is introduced to realize high-performance data transfer in a multiple-valued fine-grain reconfigurable VLSI (MVFG-RVLSI). A global pipelined tree network is utilized to realize high-performance long-distance bit-parallel data transfer. Moreover, a logic-in-memory architecture is employed for solving the data transfer bottleneck between a block data memory and a cell. A local X-net network is utilized to realize simple interconnections and compact switch blocks for eight-nearest-neighbor data transfer. Moreover, multiple-valued signaling is utilized to improve the utilization of the X-net network, where two binary data can be transferred from two adjacent cells to one common adjacent cell simultaneously at each “X” intersection. To evaluate the MVFG-RVLSI, a fast Fourier transform (FFT) operation is mapped onto a previous MVFG-RVLSI using only the X-net network and the MVFG-RVLSI using the GTLX. As a result, the computation time, the power consumption and the transistor count of the MVFG-RVLSI using the GTLX are reduced by 25%, 36% and 56%, respectively, in comparison with those of the MVFG-RVLSI using only the X-net network.
Design and Implementation of a Low-Power Multiple-Valued Current-Mode Integrated Circuit with Current-Source Control
Takahiro HANYU Satoshi KAZAMA Michitaka KAMEYAMA

PAPER-Multiple-Valued Architectures

Vol: E80-C No:7 Page(s): 941-947
A new multiple-valued current-mode (MVCM) integrated circuit using a switched current-source control technique is proposed for a 1.5 V-supply high-speed arithmetic circuit with low-power dissipation. The use of a differential logic circuit (DLC) with a pair of dual-rail inputs makes the input voltage swing small, which results in a high driving capability at a lower supply voltage, while having large static power dissipation. In the proposed DLC using a switched current control technique, the static power dissipation can be greatly reduced because current sources in non-active circuit blocks are turned off. Since the gate of each current source is directly controlled by a multiphase clock, a technique already used in dynamic circuit design, no additional transistors are required for current-source control. As a typical example of arithmetic circuits, a new 1.5 V-supply 54×54-bit multiplier based on a 0.8-µm standard CMOS technology is also designed. It is about 1.3 times faster than the fastest binary multiplier under normalized power dissipation. A prototype chip is also fabricated to confirm the basic operation of the proposed MVCM integrated circuit.
A Three-Dimensional Instrumentation VLSI Processor Based on a Concurrent Memory-Access Scheme
Seunghwan LEE Masanori HARIYAMA Michitaka KAMEYAMA

PAPER-Integrated Electronics

Vol: E80-C No:11 Page(s): 1491-1498
Three-dimensional (3-D) instrumentation using an image sequence is a promising instrumentation method for intelligent systems in which accurate 3-D information is required. However, real-time instrumentation is difficult since much computation time and a large memory bandwidth are required. In this paper, a 3-D instrumentation VLSI processor with a concurrent memory-access scheme is proposed. To reduce the access time, frequently used data are stored in a cache register array and are concurrently transferred to processing elements using simple interconnections to the 8-nearest neighbor registers. Based on a row and column memory access pattern, we propose a diagonally interleaved frame memory by which pixel values of a row and column are stored across memory modules. Based on the concurrent memory-access scheme, a 40 GOPS processor is designed and the delay time for the instrumentation is estimated to be 42 ms for a 256×256 image.
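The idea behind diagonal interleaving can be illustrated with a short Python sketch. The skewed mapping (row + column) mod N is a common scheme with the stated property and is assumed here; the abstract does not give the paper's exact mapping, and the function names are hypothetical.

```python
def module_of(row, col, num_modules):
    """Memory module holding pixel (row, col) under a diagonal (skewed) interleave."""
    return (row + col) % num_modules


def conflict_free(pixels, num_modules):
    """True if every pixel in `pixels` maps to a different module,
    so that all of them can be read in the same cycle."""
    modules = [module_of(r, c, num_modules) for r, c in pixels]
    return len(set(modules)) == len(modules)
```

With N modules, any N consecutive pixels along a row and any N consecutive pixels along a column land in N distinct modules, whereas a plain column-based interleave (col mod N) would place a whole column in a single module.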
Parallel VLSI Processors for Robotics Using Multiple Bus Interconnection Networks
Bumchul KIM Michitaka KAMEYAMA Tatsuo HIGUCHI

PAPER-Robot Electronics

Vol: E75-A No:6 Page(s): 712-719
This paper proposes parallel VLSI processors for robotics based on multiple processing elements organized around multiple bus interconnection networks. The advantages of multiple bus interconnection networks are generality, simplicity of implementation and the capability of parallel communication between processing elements; they are therefore considered suitable for parallel VLSI systems. We also propose an optimal scheduling, formulated as an integer programming problem, to minimize the delay time of the parallel VLSI processors.
FOREWORD
Michitaka KAMEYAMA

FOREWORD

Vol: E90-C No:10 Page(s): 1849-1849
Field-Programmable VLSI Based on a Bit-Serial Fine-Grain Architecture
Masanori HARIYAMA Weisheng CHONG Michitaka KAMEYAMA

PAPER

Vol: E87-C No:11 Page(s): 1897-1902
This paper presents a novel architecture to solve two problems of existing FPGAs: the large delay and area due to complex programmable switch blocks, and the large area due to coarse-grain logic blocks that are underutilized to a great degree. A mesh-connected cellular array based on a bit-serial pipeline architecture is introduced to minimize the complexity of switch blocks. A fine-grain logic block architecture with the functionality of a bit-serial adder is presented to minimize the number of inputs and outputs of the logic block, since an increase in the number of inputs and outputs directly increases the complexity of a switch block. For an area-efficient design, the logic block is implemented based on a hybrid of a programmable logic gate and dedicated carry logic. The hybrid architecture allows us to use a small lookup table to implement the logic gate. Moreover, the carry logic uses a functional pass-gate that merges both logic and storage functions compactly. The performance of the fine-grain field-programmable VLSI (FPVLSI) is evaluated to be more than 2 times higher than that of a coarse-grain FPVLSI.
FOREWORD Open Access
Michitaka KAMEYAMA

FOREWORD

Vol: E93-D No:8 Page(s): 2025-2025
An Asynchronous FPGA Based on LEDR/4-Phase-Dual-Rail Hybrid Architecture
Shota ISHIHARA Yoshiya KOMATSU Masanori HARIYAMA Michitaka KAMEYAMA

PAPER-Electronic Circuits

Vol: E93-C No:8 Page(s): 1338-1348
This paper presents an asynchronous FPGA that combines 4-phase dual-rail encoding and LEDR (Level-Encoded Dual-Rail) encoding. 4-phase dual-rail encoding is employed to achieve small area and low power for function units, while LEDR encoding is employed to achieve high throughput and low power for the data transfer using programmable interconnection resources. Area-efficient protocol converters and their control circuits are also proposed in transistor-level implementation. The proposed FPGA is designed using the e-Shuttle 65nm CMOS process. Compared to the 4-phase-dual-rail-based FPGA, the throughput is increased by 69% with almost the same transistor count. Compared to the LEDR-based FPGA, the transistor count is reduced by 47% with almost the same throughput. In terms of power consumption, the proposed FPGA achieves the lowest power compared to the 4-phase-dual-rail-based and the LEDR-based FPGAs. Compared to the synchronous FPGA, the proposed FPGA has lower power consumption when the workload is below 35%.
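The throughput difference between the two encodings comes from how many rail transitions each data bit costs, which the following Python sketch illustrates. It models only the signaling conventions as they are usually defined (LEDR with a value rail and a repeat rail whose XOR gives an alternating phase; 4-phase dual-rail with an all-zero spacer between data words). The function names and the initial rail states are assumptions, and nothing here reflects the FPGA's circuits or protocol converters.

```python
def ledr_encode(bits):
    """LEDR tokens as (value_rail, repeat_rail); the phase alternates 0, 1, 0, ...
    and equals value_rail XOR repeat_rail."""
    return [(v, v ^ (k % 2)) for k, v in enumerate(bits)]


def four_phase_encode(bits):
    """4-phase dual-rail tokens as (true_rail, false_rail),
    each data word followed by the (0, 0) spacer."""
    tokens = []
    for v in bits:
        tokens.append((v, 1 - v))
        tokens.append((0, 0))
    return tokens


def transitions(tokens, start):
    """Total number of rail toggles when driving `tokens` from state `start`."""
    count, prev = 0, start
    for cur in tokens:
        count += sum(a != b for a, b in zip(prev, cur))
        prev = cur
    return count
```

For example, sending the bits 1, 0, 0, 1 takes four rail transitions in LEDR (starting from the odd-phase state (0, 1)) and eight in 4-phase dual-rail (starting from the spacer), since the latter must return to the spacer after every bit.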
A VLSI-Oriented Model-Based Robot Vision Processor for 3-D Instrumentation and Object Recognition
Yoshifumi SASAKI Michitaka KAMEYAMA

PAPER

Vol: E77-C No:7 Page(s): 1116-1122
In a robot vision system, enormous computational power is required to perform three-dimensional (3-D) instrumentation and object recognition. Moreover, many kinds of complex and irregular operations are required for accurate 3-D instrumentation and object recognition in conventional methods intended for software implementation. In this paper, a VLSI-oriented Model-Based Robot Vision (MBRV) processor is proposed for high-speed and accurate 3-D instrumentation and object recognition. An input image is compared with two-dimensional (2-D) silhouette images which are generated from the 3-D object models by means of perspective projection. Because the MBRV algorithm always gives the candidates for the accurate 3-D instrumentation and object recognition result with simple and regular procedures, it is suitable for implementation as a VLSI processor. A highly parallel architecture is employed in the VLSI processor to reduce the latency between the image acquisition and the output generation of the 3-D instrumentation and object recognition results. As a result, 3-D instrumentation and object recognition can be performed 10000 times faster than on a 28.5 MIPS workstation.
Unified Scheduling of High Performance Parallel VLSI Processors for Robotics
Bumchul KIM Michitaka KAMEYAMA Tatsuo HIGUCHI

PAPER-Parallel Processor Scheduling

Vol: E76-A No:6 Page(s): 904-910
The performance of processing elements can be improved by the progress of VLSI circuit technology, while the communication overhead cannot be neglected in a parallel processing system. This paper presents a unified scheduling that allocates tasks having different task processing times to multiple processing elements. The objective function is formulated to measure communication time between processing elements. By employing constraint conditions, the scheduling efficiently generates an optimal solution using integer programming so that minimum communication time can be achieved. We also propose a VLSI processor for robotics whose latency is very small. In the VLSI processor, the data transfer between two processing elements can be done very quickly, so that the communication cycle time is greatly reduced.
